[SPARK-10809] [MLlib] Single-document topicDistributions method for LocalLDAModel by hhbyyh · Pull Request #9484 · apache/spark

hhbyyh · 2015-11-05T02:20:43Z

jira: https://issues.apache.org/jira/browse/SPARK-10809

We could provide a single-document topicDistributions method for LocalLDAModel to allow for quick queries which avoid RDD operations. Currently, the user must use an RDD of documents.

Also adds some missing asserts.
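For illustration, here is a minimal pure-Scala sketch of the two call shapes. It assumes no Spark dependency: a `Seq` stands in for an RDD, and an `Array[Double]` of term counts stands in for a `Vector`. The class `TopicQuery` and its scoring rule are hypothetical. The rule (weight each topic's term probabilities by the document's term counts, then normalize) is a simple stand-in, not the inference `LocalLDAModel` actually performs.

```scala
// Hypothetical stand-in for a trained local topic model:
// topicTerm(k)(w) is the probability of term w under topic k.
final class TopicQuery(topicTerm: Array[Array[Double]]) {
  val numTopics: Int = topicTerm.length

  // Single-document query: no distributed collection, no document ID.
  // NOTE: naive scoring for illustration only (count-weighted term
  // probabilities, normalized); this is NOT LDA's variational inference.
  def topicDistribution(termCounts: Array[Double]): Array[Double] = {
    val scores: Array[Double] = topicTerm.map { row =>
      row.indices.map(w => row(w) * termCounts(w)).sum
    }
    val total = scores.sum
    // Sketch's own choice: an empty document maps to all zeros.
    if (total == 0.0) Array.fill(numTopics)(0.0) else scores.map(_ / total)
  }

  // Batch query keyed by document ID (a Seq stands in for an RDD here).
  def topicDistributions(docs: Seq[(Long, Array[Double])]): Seq[(Long, Array[Double])] =
    docs.map { case (id, counts) => (id, topicDistribution(counts)) }
}
```

The single-document method needs neither a distributed collection nor an ID; the batch method only adds the bookkeeping of carrying each document's ID alongside its result.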

SparkQA · 2015-11-05T03:08:46Z

Test build #45087 has finished for PR 9484 at commit cb5d823.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2015-11-06T00:50:17Z

mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala

Can you please remove the doc ID? It's not necessary for a single doc, and removing it will make this more Java-friendly.

SparkQA · 2015-11-06T08:10:00Z

Test build #45202 has finished for PR 9484 at commit a175ab1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

This adds LDA to spark.ml, the Pipelines API. It follows the design doc in the JIRA: [https://issues.apache.org/jira/browse/SPARK-5565], with one major change:

* I eliminated doc IDs. These are not necessary with DataFrames since the user can add an ID column as needed.

Note: This will conflict with [#9484], but I'll try to merge [#9484] first and then rebase this PR.

CC: hhbyyh feynmanliang If you have a chance to make a pass, that'd be really helpful--thanks!

Now that I'm done traveling & this PR is almost ready, I'll see about reviewing other PRs critical for 1.6.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #9513 from jkbradley/lda-pipelines.

(cherry picked from commit e281b87)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
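The reasoning in the commit message above is that a DataFrame transformation keeps the caller's existing columns and only appends new ones, so an ID column survives without the model knowing about it. The sketch below imitates that with plain Scala case classes standing in for rows. `DocRow`, `ScoredRow`, and `appendTopicColumn` are hypothetical names, and `infer` is any per-document inference function supplied by the caller.

```scala
// Hypothetical stand-ins for DataFrame rows; the user chose to carry an `id`.
final case class DocRow(id: Long, features: Vector[Double])
final case class ScoredRow(id: Long, features: Vector[Double], topicDistribution: Vector[Double])

object AppendColumnSketch {
  // Imitates a transform step: existing fields are copied unchanged and
  // one new field is appended. `infer` only ever sees the features.
  def appendTopicColumn(
      rows: Seq[DocRow],
      infer: Vector[Double] => Vector[Double]): Seq[ScoredRow] =
    rows.map(r => ScoredRow(r.id, r.features, infer(r.features)))
}
```

Because the ID rides along as an ordinary column, the model's own signature can stay ID-free.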

jkbradley · 2016-01-06T19:43:27Z

mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala

Update this line (no doc ID)

jkbradley · 2016-01-06T19:43:56Z

@hhbyyh Sorry again for the delay, but we can get this merged now

hhbyyh · 2016-01-07T04:15:42Z

@jkbradley It's quite all right. Thanks for reviewing. Update sent.

SparkQA · 2016-01-07T05:05:13Z

Test build #48895 has finished for PR 9484 at commit 9204462.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-01-07T20:05:40Z

mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala

The Scala doc for this line is not generated correctly. Can you try removing the argument and just writing [[topicDistributions]] instead?
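A small sketch of the suggested doc-link style: each method's Scaladoc links to its counterpart by bare member name inside `[[...]]`, with no argument list. The trait and its signatures are hypothetical placeholders echoing this thread, not the merged Spark API.

```scala
// Hypothetical trait; only the [[...]] link style is the point here.
trait TopicDistributionDocs {

  /** Topic mixtures for many documents keyed by ID.
    * For a single document, see [[topicDistribution]].
    */
  def topicDistributions(docs: Seq[(Long, Array[Double])]): Seq[(Long, Array[Double])]

  /** Topic mixture for one document.
    * For batches of documents, see [[topicDistributions]].
    */
  def topicDistribution(doc: Array[Double]): Array[Double]
}
```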

hhbyyh · 2016-01-11T08:12:02Z

Sorry for the late response. Update sent

JoshRosen · 2016-01-11T08:48:36Z

Jenkins, retest this please.

SparkQA · 2016-01-11T09:16:50Z

Test build #49109 has finished for PR 9484 at commit 0481c44.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

hhbyyh · 2016-01-11T09:27:53Z

Getting many TimeoutExceptions.

SparkQA · 2016-01-11T09:42:52Z

Test build #49124 has finished for PR 9484 at commit 0481c44.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-01-11T22:55:22Z

LGTM
Merging with master
Thanks!

add predict for single doc

cb5d823

jkbradley reviewed Nov 6, 2015

jkbradley mentioned this pull request Nov 6, 2015

[SPARK-5565] [ML] LDA wrapper for Pipelines API #9513

Closed

hhbyyh added 2 commits November 6, 2015 11:10

Merge remote-tracking branch 'upstream/master' into ldaTopicPre

8216dc8

ut update

a175ab1

jkbradley reviewed Jan 6, 2016

mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala Outdated

hhbyyh added 2 commits January 7, 2016 11:27

Merge remote-tracking branch 'upstream/master' into ldaTopicPre

b444afd

update name and comment

9204462

jkbradley reviewed Jan 7, 2016

hhbyyh added 2 commits January 11, 2016 14:59

Merge remote-tracking branch 'upstream/master' into ldaTopicPre

52a2fd9

fix link

0481c44

asfgit closed this in bbea888 Jan 11, 2016

4 participants